Abstract—Consider a linear system subject to stochastic disturbances
and a path to be followed by a system’s output. The path-following
problem is posed here as choosing both the control input
and the speed along the path to minimize the expected value of a 
quadratic function of the control input and of the error between 
the output and the resulting trajectory. The optimal control input 
policy for the deterministic version (no stochastic disturbances) 
is first provided and shown to be the sum of linear state feedback 
and path-dependent components, as for the twin linear quadratic 
trajectory-tracking problem. This policy is proven to also be 
optimal for the original stochastic problem when the path is a 
straight line. For general paths, it acts as a certainty equivalent 
policy that is shown to improve the cost of the optimal trajectory- 
tracking policy for any given trajectory, both when it can be 
exactly computed and when proposed approximate methods are 
used otherwise. 


Index Terms—Linear Quadratic Control, Path Following, Op- 
timal Control, Stochastic Control 


I. INTRODUCTION 


In many control applications, rather than having a system’s
output track a time-parameterized reference, it is more
important to minimize the distance of the output to a given 
path while meeting soft time constraints, such as the desired 
speed. These applications include machine tooling (e.g., CNC, 
laser profiling) [1]-[3], robotic manipulators [4], [5], and 
autonomous vehicles [6], [7]. Path-following addresses these 
problems, where precise trajectory-tracking in a timewise 
sense is sacrificed for output accuracy. It typically leads to 
time-invariant motion control policies, consistently showing 
superior results to trajectory tracking in the mentioned appli- 
cations, requiring less demanding actuation [8] and avoiding 
corner cutting [9]. The price to pay is challenging nonlinear 
and non-convex optimization problems [2], [10]-[12], even 
for linear systems, with few results formalizing improvements 
with respect to trajectory tracking [10], [11]. 

A popular formulation in the autonomous vehicle community
refers to path-following as an online control law
addressing two tasks [8]: (i) reach the desired path as a 
function of a so-called path variable, and (ii) satisfy an 
additional dynamic specification, often through controlling the 
path variable. A prime example of the path variable is the 
longitudinal position of the vehicle along the path obtained 
through a numerical projection [6], [7], [13]. Similarly in 
spirit, but geared towards machine tooling applications, some 
approaches aim at reducing the contouring error, which is the
output deviation from the desired contouring path [2]. These 
approaches include cross-coupled control [1] and model predictive
control (MPC) [2], [3]. Another problem formulation,
popular in the machine tooling and robotic manipulation areas,
is motivated by a suboptimal two-step approach for handling 
time-optimal motion planning problems [4]: a geometric path 
is first computed to meet task-specific requirements (e.g., ob- 
stacle avoidance) and then a time-optimal trajectory along the 
geometric path is computed, taking into account state and input 
constraints. It has been tackled with several approaches, such 
as dynamic programming [5], and several assumptions, some 
of which remarkably lead to convex problems [4]. While path- 
following in this sense refers to an off-line planning problem, 
online control methods compatible with this off-line planning 
have also been proposed [14]. Additional ideas are the online 
trajectory time scaling [15] and the choice of trajectory speed 
online [12], [16]. While these ideas have mainly been used 
in the context of time-optimal problems, an error optimization 
approach is followed in the present paper. Only a few articles 
followed this approach, see [17] considering nonlinear robotic 
systems and [12] considering MPC, but derived very different 
results. Also related, [10], [11] consider an error performance 
index and use (instead of optimizing) the path speed freedom 
to show that the limitations of trajectory tracking for non- 
minimum phase systems are not present in path-following. In 
all these articles, process disturbances are not considered; here, 
stochastic disturbances are considered and a set of new results 
is provided. 

The present paper considers a linear system subject to 
stochastic disturbances and a cost penalizing the error between 
an output of the system and a trajectory along the path 
whose speed can be chosen online. The trajectory’s speed and 
control input policies aim to minimize the expected value of 
a quadratic function of this error and of the control input. 
The paper provides three main contributions. Considering first 
the case where disturbances are not present, it shows that the 
optimal policy for the control inputs can be decomposed into 
linear state feedback and path-dependent components, as for 
the twin linear quadratic trajectory tracking problem. Second, 
it shows that this policy is also optimal when stochastic 
disturbances are present, provided that the path is a straight 
line, i.e., certainty equivalence [18] holds; here, the expected 
quadratic cost can be exactly computed based on Riccati 
equations. Third, it proposes to use this policy as a certainty 
equivalent policy [18] when the path is non-linear and stochas- 
tic disturbances are present and provides an algorithm building 
upon gradient search and MPC that guarantees an improved 
cost with respect to the optimal trajectory tracking policy. The 
crucial enabling fact for these three results is that the cost 
of the optimal trajectory tracking policy admits a closed-form 
expression under the assumptions of linearity and no input and 
state constraints. These strong properties are not guaranteed 
by more powerful and general-purpose methods that do not 
need these assumptions (e.g., MPC). This and other facts are 


illustrated through numerical examples. 

The remainder of the paper is organized as follows. The 
optimal path-following problem with trajectory speed opti- 
mization is formulated in Section II. The main results and 
algorithms are given in Section III. A numerical example
is provided in Section IV. Section V presents concluding 
remarks. The proofs of the results are given in the appendix. 


II. PROBLEM FORMULATION AND PRELIMINARIES 


Consider a discrete-time system 


\[ x_{k+1} = A x_k + B u_k + w_k, \qquad z_k = C x_k, \tag{1} \]

where $x_k \in \mathbb{R}^n$ is the state, $z_k \in \mathbb{R}^p$ is the output, and
$u_k \in \mathbb{R}^m$ is the input at time $k \in \mathcal{H} := \{0,1,\dots,h-1\}$,
over a finite horizon $h \in \mathbb{N}$. The disturbance inputs $w_k$
are assumed to be zero-mean independent and identically
distributed random variables with covariance $W = \mathbb{E}[w_k w_k^\top]$
for every $k$. A spatial path $\gamma(s) \in \mathbb{R}^p$ parameterized by
$s \in (a,b)$ with $a \in \mathbb{R} \cup \{-\infty\}$ and $b \in \mathbb{R} \cup \{\infty\}$, $a < b$, is
considered with continuously differentiable $\gamma$. The output $z_k$
should follow the spatial path $\gamma(s)$ considered in the interval
$s \in [\underline{s}, \bar{s}]$, with $\underline{s} \ge a$ and $\bar{s} \le b$, in the sense that the errors

\[ \min_{s \in [\underline{s},\bar{s}]} \|z_k - \gamma(s)\|,\ k \in \mathcal{H}, \qquad z_h - \gamma(\bar{s}), \tag{2} \]


are small (ideally zero). This is formulated as follows. Let 


\[ s_{k+1} = s_k + v_k, \quad k \in \mathcal{H}, \tag{3} \]


with $s_0 = \underline{s}$, so that the sampled reference/trajectory takes the
form $r_k = \gamma(s_k)$, $k \in \mathcal{H} \cup \{h\}$, for some control inputs
$v_k$. These are referred to as (trajectory) speeds, although they
do not necessarily correspond to physical speeds. Moreover,
let $\xi_k = [x_k^\top\ s_k]^\top$. The path-following problem is posed
as the following stochastic optimal control problem: find $\pi = (\pi_u, \pi_v)$,
where $\pi_u = \{\mu_0, \mu_1, \dots, \mu_{h-1}\}$ is a policy for
$u_k = \mu_k(\xi_k) \in \mathbb{R}^m$ and $\pi_v = \{\sigma_0, \sigma_1, \dots, \sigma_{h-1}\}$ is a policy for
$v_k = \sigma_k(\xi_k) \in \mathcal{W}_k$, in order to solve

\[ J_0(\xi_0) = \min_{\pi} \mathbb{E}\Big[ \sum_{k=0}^{h-1} \big( \|z_k - \gamma(s_k)\|_Q^2 + \|\mu_k(\xi_k)\|_R^2 \big) + \|z_h - \gamma(s_h)\|_{Q_h}^2 \Big] \tag{4} \]

for (1), (3), subject to $s_h = \bar{s}$, for positive semi-definite (tuning)
matrices $Q$, $Q_h$ and $R$, where $\|a\|_Q^2 := a^\top Q a$ for a vector
$a$. The first term of the running cost in (4) together with the
freedom of controlling $s_k$ captures the first goal in (2) whereas
the terminal cost in (4) together with the constraint $s_h = \bar{s}$
captures the second goal in (2). As usual, a penalty on the
control input is added for regularization. The sets $\mathcal{W}_k$ play an
important role. Three important cases $\mathcal{W}_k = \mathcal{W}_k^\ell$, $\ell \in \{1,2,3\}$,
specified next for $k \in \mathcal{H} \setminus \{h-1\}$, are:

\[ \mathcal{W}_k^1 = (0, \bar{s} - s_k], \tag{PF-C} \]

constraining the path variable to be monotonically increasing,
$\underline{s} \le s_k < s_{k+1} \le \bar{s}$, in the spirit of [10], [8],

\[ \mathcal{W}_k^2 = [a - s_k, b - s_k], \tag{PF-NC} \]

imposing only that $\underline{s} \le s_k \le \bar{s}$ when $a = \underline{s}$ and $b = \bar{s}$ and
imposing no constraints when $a = -\infty$ and $b = \infty$, and

\[ \mathcal{W}_k^3 = \{\bar{v}_k\}, \tag{TT} \]

capturing trajectory tracking with $v_k = \bar{v}_k$ for given constants
$\bar{v}_k$, $k \in \mathcal{H}$. For all the cases $\ell \in \{1,2,3\}$, $\mathcal{W}_{h-1}^\ell = \{\bar{s} - s_{h-1}\}$.
While it is convenient to restrict $v_k$ to a neighborhood of
nominal values, $v_k \in \mathcal{W}_k \cap [\bar{v}_k - \epsilon, \bar{v}_k + \epsilon]$ for some $\epsilon > 0$, to
avoid large intervals $s_{k+1} - s_k$ corresponding to generated
trajectories far from the intended one, this is not pursued here for
brevity (see Remark 1 for further discussion).

Before moving to the main results, the optimal policy for
the trajectory tracking problem is discussed.


A. Trajectory tracking 
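For each $k \in \mathcal{H} \cup \{-1\}$, it is convenient to define the future
reference $\rho_k = [r_{k+1}^\top\ r_{k+2}^\top\ \dots\ r_h^\top]^\top \in \mathbb{R}^{n_k p}$, with $n_k = h - k$.
In the case of trajectory tracking, i.e., when $v_k = \bar{v}_k$
for fixed $\bar{v}_k$, $k \in \mathcal{H}$, it is given by

\[ \rho_k = q_k(\bar{\nu}_k, s_k), \quad q_k(\nu_k, s_k) = \Gamma_{n_k}(h_k(\nu_k, s_k)), \quad h_k(\nu_k, s_k) = F_{n_k} \nu_k + s_k 1_{n_k}, \tag{5} \]

where $\bar{\nu}_k = [\bar{v}_k\ \bar{v}_{k+1}\ \dots\ \bar{v}_{h-1}]^\top \in \mathbb{R}^{n_k}$, and $F_n \in \mathbb{R}^{n \times n}$ and
$\Gamma_n(a) : \mathbb{R}^n \to \mathbb{R}^{np}$ are given by

\[ F_n = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 1 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{bmatrix}, \qquad \Gamma_n(a) = \begin{bmatrix} \gamma(a_1) \\ \gamma(a_2) \\ \vdots \\ \gamma(a_n) \end{bmatrix}. \]

In such a case, an optimal policy $\pi_u$ that minimizes (4) is

\[ u_k = K_k x_k + L_k \rho_k, \quad k \in \mathcal{H}, \tag{6} \]

and the resulting optimal costs-to-go (defined similarly to (4)
but with cost starting at a given $k$ rather than 0) are

\[ J_k^{TT}(\xi_k, \bar{\nu}_k) = x_k^\top P_k x_k + 2 x_k^\top N_k \rho_{k-1} + \rho_{k-1}^\top M_k \rho_{k-1} + d_k, \tag{7} \]

where, for $k = h$,

\[ P_h = C^\top Q_h C, \quad N_h = -C^\top Q_h, \quad M_h = Q_h, \quad d_h = 0, \]

and, for $k \in \{h-1, h-2, \dots, 0\}$,

\begin{align*}
P_k &= A^\top P_{k+1} A + C^\top Q C - A^\top P_{k+1} B (R + B^\top P_{k+1} B)^{\dagger} B^\top P_{k+1} A, \\
K_k &= -(R + B^\top P_{k+1} B)^{\dagger} B^\top P_{k+1} A, \\
L_k &= -(R + B^\top P_{k+1} B)^{\dagger} B^\top N_{k+1}, \\
N_k &= \begin{bmatrix} -C^\top Q & (A + B K_k)^\top N_{k+1} \end{bmatrix}, \\
M_k &= \begin{bmatrix} Q & 0 \\ 0 & M_{k+1} - N_{k+1}^\top B (R + B^\top P_{k+1} B)^{\dagger} B^\top N_{k+1} \end{bmatrix}, \\
d_k &= \sum_{\ell=k+1}^{h} \mathrm{tr}(P_\ell W),
\end{align*}

with $(\cdot)^{\dagger}$ and $\mathrm{tr}(\cdot)$ denoting the pseudo-inverse and the trace.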

This fact can be obtained by combining standard results 
for discrete-time preview control [19] and stochastic optimal 
control [18, Vol 1, Ch. 3] and relies on the independence of 


the $w_k$. Policy (6) is optimal both when $w_k = 0$ for every
$k \in \mathbb{N}_0 := \mathbb{N} \cup \{0\}$ ($W = 0$) and when stochastic disturbances
are considered; hence certainty equivalence holds for this twin
trajectory tracking problem (according to [18, Vol. 1, Sec. 1.3],
certainty equivalence holds if ‘the optimal policy is unaffected
when the disturbances are replaced by their means’).
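As a rough illustration of the backward recursion behind (6) and (7), the following sketch specializes it to a scalar system ($n = m = p = 1$), so that every matrix becomes a number and the row vector $N_k$ becomes a list. The function name `tracking_recursion` and the numerical values are assumptions made only for this example, and the matrix $M_k$ is omitted for brevity.

```python
def tracking_recursion(A, B, C, Q, Qh, R, W, h):
    """Scalar backward recursion for the trajectory-tracking gains.

    Returns lists P, K, L, N, d indexed by the stage k (hypothetical helper).
    """
    P = [0.0] * (h + 1)
    K = [0.0] * h
    L = [None] * h
    N = [None] * (h + 1)
    P[h] = C * Qh * C
    N[h] = [-C * Qh]
    for k in range(h - 1, -1, -1):
        S = R + B * P[k + 1] * B
        S_pinv = 1.0 / S if S != 0.0 else 0.0  # pseudo-inverse of a scalar
        K[k] = -S_pinv * B * P[k + 1] * A
        L[k] = [-S_pinv * B * n for n in N[k + 1]]
        N[k] = [-C * Q] + [(A + B * K[k]) * n for n in N[k + 1]]
        P[k] = (A * P[k + 1] * A + C * Q * C
                - A * P[k + 1] * B * S_pinv * B * P[k + 1] * A)
    # d_k accumulates tr(P_l W) for l = k+1, ..., h (zero when W = 0).
    d = [sum(P[l] * W for l in range(k + 1, h + 1)) for k in range(h + 1)]
    return P, K, L, N, d


# Illustrative values only: a single integrator with unit weights.
P, K, L, N, d = tracking_recursion(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1, 3)
```

At the last stage this gives $u_{h-1} = -0.5 x_{h-1} + 0.5 r_h$, which is what minimizing $u^2 + (x + u - r_h)^2$ by hand yields. Only the term $d_k$ depends on $W$, which is consistent with the certainty equivalence property stated above.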


III. MAIN METHODS AND RESULTS 


Section III-A provides the optimal policy for a deterministic 
version of the path-following problem (4). Section III-B shows
that this policy is also optimal for the stochastic version when 
the path is a straight line. In Section III-C, the performance of 
this policy and of the proposed approximate methods is shown
to improve that of any given trajectory-tracking policy. 


A. Optimal policy for deterministic version 


The first result considers $w_k = 0$ for every $k \in \mathcal{H}$, in which
case the expected value in (4) can be removed. For a vector
$a \in \mathbb{R}^n$, $a \ge 0$ ($a \le 0$) indicates that all the components are
non-negative (non-positive), i.e., $a_i \ge 0$ ($a_i \le 0$) for every
$i$; $I_n$, $1_n$, $0_{m \times n}$ denote the $n \times n$ identity matrix, the
row vector of $n$ ones, and the $m \times n$ matrix with zero entries,
respectively, and $0_m = 0_{m \times m}$. Dimensions are often omitted.


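Theorem 1. Suppose that $w_k = 0$ for every $k \in \mathcal{H}$ for
system (1). Then, an optimal policy for (4) is

\[ u_k = K_k x_k + L_k \rho_k^*(\xi_k), \quad k \in \mathcal{H}, \tag{8} \]
\[ v_k = [1\ \ 0_{1 \times (n_k - 1)}]\, \nu_k^*(\xi_k), \tag{9} \]

where $\rho_k^*(\xi_k) = q_k(\nu_k^*(\xi_k), s_k)$ is the future reference resulting
from the following speeds

\[ \nu_k^*(\xi_k) \in \arg\min_{\nu_k \in \mathcal{V}_k^\ell} f_k(\xi_k, \nu_k), \tag{10} \]
\[ f_k(\xi_k, \nu_k) = 2 x_k^\top N_k q_k^0(\nu_k, s_k) + q_k^0(\nu_k, s_k)^\top M_k q_k^0(\nu_k, s_k), \]

with $q_k^0(\nu_k, s_k) = [\gamma(s_k)^\top\ q_k(\nu_k, s_k)^\top]^\top$, and $\mathcal{V}_k^\ell$ is defined
as follows for each possibility for the sets $\mathcal{W}_k^\ell$, $\ell \in \{1,2,3\}$:

\[ \mathcal{V}_k^1 = \{ v \in \mathbb{R}^{n_k} \mid 1_{n_k}^\top v = \bar{s} - s_k,\ v \ge 0 \} \tag{PF-C} \]
\[ \mathcal{V}_k^2 = \{ v \in \mathbb{R}^{n_k} \mid 1_{n_k}^\top v = \bar{s} - s_k,\ a 1_{n_k} \le h_k(v, s_k) \le b 1_{n_k} \} \tag{PF-NC} \]
\[ \mathcal{V}_k^3 = \{ (\bar{v}_k, \dots, \bar{v}_{h-1}) \mid \bar{v}_{h-1} = \bar{s} - s_{h-1} \} \tag{TT} \]

for $k \in \mathcal{H} \setminus \{h-1\}$ and $\mathcal{V}_{h-1}^\ell = \{\bar{s} - s_{h-1}\}$, $\ell \in \{1,2,3\}$.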
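To make the constraint sets in Theorem 1 more concrete, the sketch below checks whether a candidate speed vector belongs to the (PF-C) or (PF-NC) set, using the recursion $s_{k+1} = s_k + v_k$ from (3). The helper names `path_variables` and `is_feasible`, the string labels, and the tolerance are assumptions introduced for this illustration only.

```python
import math


def path_variables(s_k, speeds):
    """Future path variables s_{k+1}, ..., s_h obtained by accumulating speeds."""
    out, s = [], s_k
    for v in speeds:
        s += v
        out.append(s)
    return out


def is_feasible(speeds, s_k, s_bar, case, a=-math.inf, b=math.inf, tol=1e-9):
    """Membership test for the speed sets under (PF-C) or (PF-NC)."""
    # Both sets require the speeds to end exactly at the path end s_bar.
    if abs(sum(speeds) - (s_bar - s_k)) > tol:
        return False
    if case == "PF-C":
        # Non-negative speeds: the path variable never moves backwards.
        return all(v >= -tol for v in speeds)
    if case == "PF-NC":
        # Only the path variable itself has to remain inside [a, b].
        return all(a - tol <= s <= b + tol for s in path_variables(s_k, speeds))
    raise ValueError("case must be 'PF-C' or 'PF-NC'")


forward = [1.0, 2.0, 3.0]      # monotone profile from s = -3 to s = 3
backtrack = [4.0, -1.0, 3.0]   # moves backwards at the second step
```

In this example `backtrack` is rejected under (PF-C) but accepted under (PF-NC) with $a = -3$ and $b = 3$, since its path variables 1, 0, 3 never leave the interval.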


The optimal policy might not be unique since, e.g., the
optimization of $f_k(\xi_k, \nu_k)$ with respect to $\nu_k$ might have
multiple solutions. Function $f_k$ is such that $J_k^{TT}(\xi_k, \nu_k) =
x_k^\top P_k x_k + f_k(\xi_k, \nu_k)$, so that the minimization with respect
to $\nu_k$ can be interpreted as minimizing the cost-to-go of
trajectory-tracking with respect to the trajectory itself along
the path or, equivalently, the speed inputs.

Note that the optimal policy has a similar structure to the 
one of trajectory-tracking (under (TT), it naturally matches (6)) 
consisting of the sum of linear state feedback and path- 
dependent components. However, the latter component now 
depends on the state and results from the speed optimization 
problem (10). Section III-C discusses how to solve (10).


B. Straight lines 


Suppose that the path is a straight line described by 


\[ \gamma(s) = \phi + \chi s, \quad -\infty < s < \infty, \tag{11} \]
for given vectors $\phi \in \mathbb{R}^p$ and $\chi \in \mathbb{R}^p \setminus \{0\}$ and for
$a = -\infty$ and $b = \infty$, and that the path variable is not


constrained, i.e., (PF-NC) holds. Then, as shown in the next 
result, policy (8), (9) boils down to 


uk = Kyap t+ Lelo + x5) (12) 
= T 
vp = Ik Ex Tk ri] (13) 


where Ty, = —(1J MHo) [I] NI Wy MI], Kni = 
Kn—ı and Ly,-1 = Lp_y and, for k € {h — 2,h — 1,...,0}, 


Ki = Kp + [Omxp Lx] Hork b n 
2pxn 
- A On 
Lk — lOmp L] (Iı + ILI% | a 
2p 
are computed iteratively for k € {h — 2,h — 1,...,0} from 


the Riccati equations 
Pa = Pah, Nn = Nn, Mr = Mhn 
Pui = Pra; Nn- = Nn, Mn- = Mni 
Py = A Pp At+Q 
— ATP. B(R+ B' Py BB Py A 


=[-C]Q (A+ BK)" Nez] 


k 
ma(|? —_ : 0 2 
k> jo Mryi — Nl B(R+ B' Py B)'B" Nkya 
Ky, = —(R+ B’ P41 B)'B' Py A 
L= -(R+ B! P41B) B! Ney 
Pe N) f| R NI 
Ng Mg) I N II Mpi 
NÑ,Il T s y 
- [ip idn] OS Man SE pv 


As shown next, policy (12), (13) is also optimal for the
original stochastic problem, i.e., certainty equivalence holds. 


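Theorem 2. Policy (12), (13) is an optimal policy for problem (4)
when the path is a straight line described by (11)
and the path variable is unconstrained, i.e., (PF-NC) holds. In
particular, (12), (13) coincide with (8), (9) when $w_k = 0$ for
every $k \in \mathcal{H}$ and both provide unique values. Moreover, the
resulting optimal cost $J_0^{PF}(\xi_0) = J_0(\xi_0)$ is given by

\[ J_0^{PF}(\xi_0) = \begin{bmatrix} x_0 \\ \phi + \chi s_0 \\ \phi + \chi \bar{s} \end{bmatrix}^\top \begin{bmatrix} \bar{P}_0 & \bar{N}_0 \\ \bar{N}_0^\top & \bar{M}_0 \end{bmatrix} \begin{bmatrix} x_0 \\ \phi + \chi s_0 \\ \phi + \chi \bar{s} \end{bmatrix} + \sum_{\ell=1}^{h} \mathrm{tr}(\bar{P}_\ell W). \]

Furthermore, $J_0^{PF}(\xi_0) \le J_0^{TT}(\xi_0, \bar{\nu}_0)$ for every $\xi_0 = (x_0, s_0) \in
\mathbb{R}^{n+1}$ and any given $\bar{\nu}_0 = (\bar{v}_0, \dots, \bar{v}_{h-1}) \in \mathbb{R}^h$ with $\bar{v}_{h-1} =
\bar{s} - s_{h-1}$.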


The case of a linear path yields the strongest results as one
can exactly quantify the cost gain when using path-following
versus trajectory-tracking, as both $J_0^{PF}(\xi_0)$ and $J_0^{TT}(\xi_0, \bar{\nu}_0)$
admit closed-form expressions. Note that while (4) is quadratic
in the state $\xi_k = [x_k^\top\ s_k]^\top$ when the path is linear, the proof
does not follow trivially from the standard linear quadratic
control result [18] due to the terminal constraint $s_h = \bar{s}$.


C. Cost improvement and approximate methods 


While policy (8), (9) is, in general, not optimal for (4),
it can still be applied as a certainty equivalent policy. This
section shows that it can only improve the cost of the optimal
trajectory-tracking policy (6) for any given trajectory. The
result does not need (10) to be solved exactly, but rather
approximate solutions, denoted by $\hat{\nu}_k$, $k \in \mathcal{H}$, to be found that
satisfy the conditions discussed next. Let $\hat{\nu}_k = \psi_k(\xi_k)$, $\hat{\nu}_k =
[\hat{v}_k\ \hat{v}_{k+1}\ \dots\ \hat{v}_{h-1}]^\top \in \mathcal{V}_k^\ell$, for a given $\ell \in \{1,2,3\}$, denote
the solution retrieved by the approximate method at time $k$, and
let $\Lambda_k : \mathbb{R}^{n_k} \to \mathbb{R}^{n_{k+1}}$, $\Lambda_k(\hat{\nu}_k) = [\hat{v}_{k+1}\ \dots\ \hat{v}_{h-1}]^\top$, be maps
that extract the tail of $\hat{\nu}_k$ by removing the first component
$\hat{v}_k$. Note that $\hat{\nu}_k$ is, in general, a function of the state $\xi_k$,
and $\psi_k$ is used to express this dependency. Given a fixed
trajectory characterized by $\bar{\nu}_0 \in \mathcal{V}_0^\ell$ and associated with a
trajectory-tracking policy to be compared with path-following,
the approximate method should satisfy the following two
conditions:

\[ f_0(\xi_0, \psi_0(\xi_0)) \le f_0(\xi_0, \bar{\nu}_0), \quad \forall \xi_0 \in \mathbb{R}^{n+1}, \tag{14} \]
\[ f_k(\xi_k, \psi_k(\xi_k)) \le f_k(\xi_k, \Lambda_{k-1}(\psi_{k-1}(\xi_{k-1}))), \quad \forall \xi_{k-1} \in \mathbb{R}^{n+1},\ w_{k-1} \in \mathbb{R}^n,\ k \in \mathcal{H} \setminus \{0\}. \tag{15} \]

Intuitively, condition (14) requires that, at $k = 0$ and for every
initial state $\xi_0$, the approximate algorithm retrieves a trajectory
that leads to a non-larger expected cost than that obtained
with trajectory-tracking characterized by $\bar{\nu}_0$. This connection
can be seen by noting that cost (7) and function $f_k$ in (10) are
related by $J_k^{TT}(\xi_k, \nu_k) = x_k^\top P_k x_k + f_k(\xi_k, \nu_k) + d_k$. Moreover,
condition (15) requires that, at time $k$, the approximate method
retrieves a trajectory that leads to a non-larger cost than that
associated with the tail of the trajectory computed at the
previous time step $k - 1$. These conditions are naturally met
when (10) can be solved exactly (as in the case of straight
lines, see Section III-B) and $\hat{\nu}_k = \nu_k^*$. An approximate method
that satisfies (14), (15) is provided in the sequel. The resulting
approximate policy is

\[ u_k = K_k x_k + L_k \hat{\rho}_k(\xi_k), \quad k \in \mathcal{H}, \tag{16} \]
\[ v_k = [1\ \ 0_{1 \times (n_k - 1)}]\, \psi_k(\xi_k), \tag{17} \]

where $\hat{\rho}_k(\xi_k) = q_k(\psi_k(\xi_k), s_k)$ is the reference resulting from
the speed profiles provided by an approximate method that
meets (14), (15). The mentioned result is stated next.


Theorem 3. Suppose that an approximate method for (10)
satisfies (14), (15) for a given $\bar{\nu}_0 \in \mathcal{V}_0^\ell$ and for a given $\ell \in
\{1,2,3\}$. Then the cost (4) of policy (16), (17), denoted by
$J_0^{PF}(\xi_0)$ for initial condition $\xi_0 \in \mathbb{R}^{n+1}$, satisfies

\[ J_0^{PF}(\xi_0) \le J_0^{TT}(\xi_0, \bar{\nu}_0), \quad \text{for every } \xi_0 \in \mathbb{R}^{n+1}. \]


One method that meets (14), (15) is the following gradient-based
method, also inspired by MPC. The idea behind this
method is to optimize the trajectory velocities only in a horizon
$\bar{h}$ and consider velocities at times $k \in \{0,1,\dots,h-1-\bar{h}\}$
taking the form

\[ \hat{\nu}_k = [v_k\ \dots\ v_{k+\bar{h}-2}\ \ \hat{v}_{k+\bar{h}-1}\ \ \bar{v}_{k+\bar{h}}\ \dots\ \bar{v}_{h-1}]^\top, \tag{18} \]

where $\mathbf{v}_k = [v_k\ \dots\ v_{k+\bar{h}-2}]^\top$ is a vector of free variables
to be optimized, $\bar{\nu}_{k+\bar{h}} = [\bar{v}_{k+\bar{h}}\ \dots\ \bar{v}_{h-1}]^\top$ are the
trajectory speed variables of the trajectory-tracking policy one
wishes to improve, and $\hat{v}_{k+\bar{h}-1}$ is set to $\hat{v}_{k+\bar{h}-1} = \bar{s} - s_k -
1^\top \bar{\nu}_{k+\bar{h}} - 1^\top \mathbf{v}_k$ to ensure that the initial part of the trajectory,
which is optimized, ends where the final part, which is fixed,
starts. At times $k \in \{h-\bar{h},\dots,h-2\}$, the considered velocity
profiles take the usual form $\hat{\nu}_k = [v_k\ \dots\ v_{h-2}\ \ \hat{v}_{h-1}]^\top$,
where now $\mathbf{v}_k = [v_k\ v_{k+1}\ \dots\ v_{h-2}]^\top$ are decision variables and
$\hat{v}_{h-1} = \bar{s} - s_k - 1^\top \mathbf{v}_k$. At time $k = h-1$, $\hat{\nu}_{h-1} = \bar{s} - s_{h-1}$.
The method assumes that a given trajectory is available that
corresponds to $\bar{\nu}_0 \in \mathcal{V}_0^\ell$ and works under any of the assumptions
(PF-C), (PF-NC), (TT) corresponding to $\ell \in \{1,2,3\}$.


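Algorithm 1: For each $k \in \mathcal{H}$, given state $\xi_k$, make $\hat{\nu}_k^0 =
\bar{\nu}_0 \in \mathcal{V}_0^\ell$ if $k = 0$ and $\hat{\nu}_k^0 = \Lambda_{k-1}(\hat{\nu}_{k-1})$ otherwise. Then:
Iterate: $\hat{\nu}_k^{r+1} = g_k(\xi_k, \hat{\nu}_k^r, \epsilon^r)$, for $r \in \{0,\dots,\bar{r}-1\}$
Retrieve: $\hat{\nu}_k = \hat{\nu}_k^{\bar{r}}$
where, for $\hat{\nu}_k$ taking the form (18),

\[ g_k(\xi_k, \hat{\nu}_k, \epsilon) = \begin{cases}
\begin{bmatrix} \mathbf{v}_k - \epsilon \nabla_{\mathbf{v}_k} f_k(\xi_k, \hat{\nu}_k) \\ \bar{s} - s_k - 1^\top \bar{\nu}_{k+\bar{h}} - 1^\top (\mathbf{v}_k - \epsilon \nabla_{\mathbf{v}_k} f_k(\xi_k, \hat{\nu}_k)) \\ \bar{\nu}_{k+\bar{h}} \end{bmatrix}, & \text{if } k \in \{0,\dots,h-\bar{h}-1\}, \\[2ex]
\begin{bmatrix} \mathbf{v}_k - \epsilon \nabla_{\mathbf{v}_k} f_k(\xi_k, \hat{\nu}_k) \\ \bar{s} - s_k - 1^\top (\mathbf{v}_k - \epsilon \nabla_{\mathbf{v}_k} f_k(\xi_k, \hat{\nu}_k)) \end{bmatrix}, & \text{if } k \in \{h-\bar{h},\dots,h-2\}, \\[2ex]
\bar{s} - s_{h-1}, & \text{if } k = h-1,
\end{cases} \]

with

\[ \epsilon^r \in \arg\min_{\eta \in [0, \bar{\eta}^r]} f_k(\xi_k, g_k(\xi_k, \hat{\nu}_k^r, \eta)), \]
\[ \bar{\eta}^r = \sup_{\beta \ge 0} \{ \beta \mid g_k(\xi_k, \hat{\nu}_k^r, \beta) \in \mathcal{V}_k^\ell \}. \]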


The following proposition states that conditions (14), (15)
indeed hold if Algorithm 1 is used.


Proposition 1. The outputs $\hat{\nu}_k$, $k \in \mathcal{H}$, of Algorithm 1
satisfy (14), (15) if $\bar{\nu}_0 \in \mathcal{V}_0^\ell$, for each $\ell \in \{1,2,3\}$.


Note that the approximate method in Algorithm 1 and the 
exact method in Theorem 1 involve searches over speeds 
independently of n and thus can be used for large dimensional 
systems. This is illustrated in the next section. 
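The structure of one call of Algorithm 1 can be sketched as follows, under simplifying assumptions that are not part of the paper: the gradient is approximated by forward differences, the exact line search over the step size is replaced by a search over a uniform grid, and the cost passed in is a toy quadratic rather than $f_k$. The names `improve_speeds`, `toy_cost` and `nonneg` are hypothetical. Because a candidate is accepted only when it is feasible and strictly decreases the cost, the returned profile is never worse than the initial one, which is the mechanism behind conditions (14), (15).

```python
def with_last(free, total):
    """Append the last speed so that all speeds sum to `total`."""
    return list(free) + [total - sum(free)]


def fd_gradient(cost, free, total, delta=1e-6):
    """Forward-difference gradient with respect to the free speeds."""
    base = cost(with_last(free, total))
    grad = []
    for i in range(len(free)):
        pert = list(free)
        pert[i] += delta
        grad.append((cost(with_last(pert, total)) - base) / delta)
    return grad


def improve_speeds(cost, nu0, total, feasible, iters=20, eta_max=1.0, grid=50):
    """Gradient steps with a grid line search that never increases the cost."""
    free = list(nu0[:-1])
    for _ in range(iters):
        grad = fd_gradient(cost, free, total)
        best, best_cost = free, cost(with_last(free, total))
        for j in range(1, grid + 1):
            eta = eta_max * j / grid
            cand = [v - eta * g for v, g in zip(free, grad)]
            nu = with_last(cand, total)
            if feasible(nu):
                c = cost(nu)
                if c < best_cost:  # keep the current iterate unless strictly better
                    best, best_cost = cand, c
        free = best
    return with_last(free, total)


# Toy data (assumptions): three speeds that must sum to 6 and stay non-negative.
target = [1.0, 2.0, 3.0]


def toy_cost(nu):
    return sum((v - t) ** 2 for v, t in zip(nu, target))


def nonneg(nu):
    return all(v >= 0.0 for v in nu)


nu0 = [2.0, 2.0, 2.0]
nu_hat = improve_speeds(toy_cost, nu0, 6.0, nonneg)
```

The search runs over the speeds only, so its cost per iteration does not depend on the state dimension $n$ beyond the evaluation of the cost function itself.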


IV. NUMERICAL EXAMPLES 
Consider a model consisting of two double integrators
modeling the dynamics along two spatial coordinates, denoted
by $p_1$ and $p_2$. This is captured by (1) with

\[ A = \begin{bmatrix} A_1 & 0 \\ 0 & A_1 \end{bmatrix}, \quad B = \begin{bmatrix} B_1 & 0 \\ 0 & B_1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \]

$A_1 = A(\tau)$, $B_1 = \int_0^\tau A(s)\,ds\,B_c$, $A(s) = e^{A_c s}$, $A_c = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$,
$B_c = [0\ 1]^\top$ and $\tau = 0.2$.

Fig. 1: Comparison between path-following and trajectory-tracking
for two initial conditions $x_0 = [-3.2\ 0\ 0\ 0]^\top$
and $x_0 = [0\ 0\ -1\ 0]^\top$

Fig. 2: Simulations with disturbances

For cost (4), the following
matrices are considered Q = In, Qn = 105, R = 0.001 Io. 
The simple path $(s, \arctan(s))$, $s \in (-3,3)$, with $\bar{s} =
3$ is first considered. Path-following is to be compared
with trajectory-tracking with constant speed $v_c = \big(\int_{-3}^{3}
\sqrt{1 + (\tfrac{d}{ds}\arctan(s))^2}\,ds\big)/2 = 3.342$ such that the path is followed
in 2 seconds. Here, $s$ is not the arc-length, which simplifies
the parameterization, but results in non-constant values $\bar{v}_k$.
Considering $h = 10$, these are computed from $s_{k+1} = s_k + \bar{v}_k$,
$s_0 = -3$, and $\int_{s_k}^{s_{k+1}} \sqrt{1 + (\tfrac{d}{ds}\arctan(s))^2}\,ds = v_c \tau$. The
resulting $r_k$ are depicted in red in Figure 1(a) and (c). First, the
results with no disturbances, $w_k = 0$ for every $k$, are discussed.
Figure 1(a) and (c) plot the results obtained with the fixed
trajectory with constant speed $v_c$ using trajectory-tracking for
two different initial conditions $x_0 = [-3.2\ 0\ 0\ 0]^\top$ and
$x_0 = [0\ 0\ -1\ 0]^\top$. Both the sampled output $z_k$ and
the output $z(t)$ are plotted. Figure 1(b) and (d) show the
same plots but for the trajectory obtained from Algorithm 1
with $\bar{h} = h$, which approximates the solution to (10) and is
run at the first iteration $k = 0$. For all numerical examples,
parameter $\bar{r}$ in Algorithm 1 is set to $\bar{r} = 20$. The same
values are obtained with either choice of $\mathcal{W}_k^\ell$ (and resulting $\mathcal{V}_k^\ell$),
(PF-C) or (PF-NC). The plotted $z(t)$, the sampled $z_k$, and
$s_k$ are not the actual values obtained by running the system
according to Algorithm 1 since Algorithm 1 recomputes a new
trajectory at the following time steps. Thus, they are rather
values of the state predicted at time step $k = 0$ when $w_k = 0$.
However, this shows that the predicted values have a much 
more sensible behavior than those obtained with trajectory- 
tracking since the reference is adjusted based on the state. 
The trajectory-tracking cost (4) with xj is 3.6694, and the 
predicted cost after running Algorithm 1 at k = 0 is 3.1819. 
The predicted costs at each stage $k \in \{0,1,\dots,9\}$, obtained by
rerunning Algorithm 1 at these stages, are
{3.18194, 3.17381, 3.17333, 3.17329} for $k \in \{0,1,2,3\}$ and almost
identical, equal to 3.17328, for $k \in \{4,5,\dots,9\}$. This illustrates the
decreasing property that leads to Theorem 3 in the absence of 
disturbances. These predicted costs take into account the cost 
so far and the cost predicted with the new trajectory computed 
at the corresponding stage. The cost of the proposed policy 
coincides with the value plotted at k = 9, 3.17329, since in 
such a case Algorithm 1 has been run for all the stages. 
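The constant-speed reference used above can be reproduced numerically. The sketch below integrates the arc-length density of the path $(s, \arctan(s))$ with a composite Simpson rule and then places $s_0, \dots, s_h$ so that consecutive samples are separated by equal arc length, using bisection. The function names and the discretization parameters are assumptions for this illustration; the paper does not state which numerical method was used.

```python
import math


def speed_density(s):
    """Integrand sqrt(1 + (d/ds arctan s)^2) of the arc length of (s, arctan s)."""
    return math.sqrt(1.0 + (1.0 / (1.0 + s * s)) ** 2)


def arc_length(a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = speed_density(a) + speed_density(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * speed_density(a + i * step)
    return total * step / 3.0


def sample_path(s_start, s_end, h):
    """Path variables s_0, ..., s_h splitting the arc length into h equal pieces."""
    piece = arc_length(s_start, s_end) / h
    s = [s_start]
    for _ in range(h - 1):
        lo, hi = s[-1], s_end
        for _ in range(60):  # bisection on the end point of the next piece
            mid = 0.5 * (lo + hi)
            if arc_length(s[-1], mid, 200) < piece:
                lo = mid
            else:
                hi = mid
        s.append(0.5 * (lo + hi))
    s.append(s_end)
    return s


v_c = arc_length(-3.0, 3.0) / 2.0          # path traversed in 2 seconds
s_ref = sample_path(-3.0, 3.0, 10)         # h = 10 samples
v_bar = [s_ref[k + 1] - s_ref[k] for k in range(10)]
```

The increments `v_bar` are larger near the ends of the path than near $s = 0$, where the curve is steepest, which is why a constant physical speed gives non-constant values $\bar{v}_k$.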
Consider now, for the same example, zero-mean
Gaussian white noise disturbances in (1) with a diagonal covariance
matrix $W$ with diagonal entries $[0\ \ 0.25\ \ 0\ \ 0.25]$.
Figure 2 shows a sample trajectory of path-following (obtained 
by running Algorithm 1 for all the stages) and trajectory- 
tracking in the presence of these disturbances. After 100 
Monte-Carlo simulations the following costs are obtained: for 
trajectory-tracking 3.8742 and for path-following 3.3125. The 
theoretical cost for trajectory-tracking (7) is 3.8814, close 
to the one obtained with Monte-Carlo simulations. If the
path were a straight line with the same start and
end points, i.e., a straight line connecting $(-3, \arctan(-3))$
and $(3, \arctan(3))$, the costs would be 3.3256 for trajectory-tracking
and 3.1241 for path-following (computed with (7)).
The difference between the latter costs is not very different
from the difference between the costs for the original path,
suggesting that the cost of the approximated certainty equivalent
policy is close to the cost of the optimal policy.
A more complex trajectory with intersections is considered
next. It is defined as follows: $\gamma(s) = [s-3\ \ 0]^\top$ if
$s \in [0,3)$, $\gamma(s) = [1-\cos(s-3)\ \ \sin(s-3)]^\top$ if $s \in
[3, 3+2\pi)$, and $\gamma(s) = [s-(3+2\pi)\ \ 0]^\top$ if $s \in [3+2\pi, 12.5]$.
Algorithm 1 with horizon $\bar{h} = 4$ and $\bar{\nu}_0 = v_c [1\ \dots\ 1]^\top$,
$v_c = 0.5/\tau$, is now considered with $h = 25$, and the same
parameters (model and noise) as for the previous example were
used. The results of path-following are shown in Figure 3. Note 
that the path-following algorithm is able to follow this more 
elaborate path with intersections successfully. The fact that a 
short horizon was chosen was actually beneficial in this case 
(see Remark 1 below). The average cost for path-following 
obtained with 100 Monte-Carlo simulations is 0.717 and of 
the trajectory-tracking policy (associated with vo) is 1.0776. 
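For reference, the piecewise path with intersections described above can be written as a small function. This is only a sketch of the definition given in the text; the function name `gamma_loop`, the error handling, and the helper producing equally spaced path variables are assumptions added for illustration.

```python
import math

LOOP_START = 3.0
LOOP_END = 3.0 + 2.0 * math.pi


def gamma_loop(s):
    """Straight segment, unit-radius loop, then a second straight segment."""
    if not 0.0 <= s <= 12.5:
        raise ValueError("s must lie in [0, 12.5]")
    if s < LOOP_START:
        return (s - 3.0, 0.0)
    if s < LOOP_END:
        return (1.0 - math.cos(s - 3.0), math.sin(s - 3.0))
    return (s - LOOP_END, 0.0)


def nominal_path_variables(h=25, s_end=12.5):
    """Equal increments of the path variable from 0 to s_end over h steps."""
    return [s_end * k / h for k in range(h + 1)]


nominal = nominal_path_variables()
reference = [gamma_loop(s) for s in nominal]
```

The loop starts and ends at the origin, so the point reached at $s = 3$ is visited again at $s = 3 + 2\pi$; this is the self-intersection that a short optimization horizon has to handle.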
Let us now compare these costs with that of a standard 
MPC approach. Note that Algorithm 1 is inspired by MPC, 


but it crucially takes into account the cost of the trajectory 
tracking approach after the receding horizon. This can be 
seen as a special terminal cost, which leads to the perfor- 
mance improvement property (Theorem 3). As this is the key 
difference, we use the same Algorithm 1, but now setting 
this terminal cost to zero. This amounts to considering a 
different $f_k$, which can still be written as in (10), but now
with $\hat{\nu}_k = [\mathbf{v}_k^\top\ \ \hat{v}_{k+\bar{h}-1}]^\top \in \mathbb{R}^{\bar{h}}$ and $Q_h = 0$. This leads
to a mean cost of 0.7807, still smaller than that of trajectory
tracking. If we simply change $R$ to $R = 0.01 I_2$, we obtain
the following costs: (i) for trajectory tracking, 6.232; (ii) for
the proposed method, 5.1029; (iii) for MPC, 9.980. The cost
of MPC is now worse than that of trajectory tracking, which 
highlights the key advantage of the proposed method. 

In order to illustrate that the proposed policy scales well
with the dimension of the system $n$, we consider a fifth
order integrator rather than a second order integrator. This
amounts to changing the expressions of $A_c = \begin{bmatrix} 0_{4 \times 1} & I_4 \\ 0 & 0_{1 \times 4} \end{bmatrix}$
and $B_c = [0_{1 \times 4}\ \ 1]^\top$. To obtain reasonable trajectories, the
weighting matrix $R$ has been set to $R = 0.000001 I_2$. The
results are shown in Figure 4 and are similar to those of
the previous case shown in Figure 3. The mean run time of
Algorithm 1 over the $h = 25$ time steps at which it is called is 0.5425 seconds
for the double integrator ($n = 4$) and 0.4503 seconds for
the fifth order integrator ($n = 10$), showing that the proposed
method scales well with $n$. It is surprising that with $n = 10$
the mean time is lower, but note that a different model and cost
parameters are used. This was achieved on a MacBook laptop
with no concern for optimizing the computations; a real-time
implementation needs to bring the computation time below the
sampling period $\tau = 0.2$, which can easily be achieved.


Remark 1. For the path with intersections depicted in Figure 3,
consider a simple linear trajectory defined by $s_k = \frac{12.5 - 2\pi}{25} k$,
$k \in \{0,1,\dots,12\}$, and $s_k = 2\pi + \frac{12.5 - 2\pi}{25} k$, $k \in \{13,14,\dots,25\}$,
which skips the loop and is then equivalent to a simple
straight line. The cost of trajectory-tracking for this trajectory
is 0.3521, much smaller than the one obtained with path-following,
0.717 (which optimized the trajectory but only over
a short horizon, and thus did not find this low-cost trajectory).
While better in terms of cost, this behavior is far from the
intended one, which is obtained with the short horizon $\bar{h} = 4$, beneficial
in this case. As mentioned in Section II, an alternative would
be to constrain the speed sets, in which case the large value
$v_{12} = s_{13} - s_{12}$ would be avoided and the loop not skipped.


V. CONCLUSIONS AND FUTURE WORK 


This paper provides a framework and a set of results 
for path-following with stochastic disturbances that parallel 
those of the linear quadratic trajectory-tracking framework. 
Path-following is seen as a broader problem than trajectory- 
tracking, as it optimizes the trajectory speed online to enhance 
performance. Performance is measured by a quadratic cost 
penalizing output deviations from the path and input effort. 

Some assumptions can easily be dropped. For example, to
account for non-zero mean disturbances $w_k$, it suffices to add
the auxiliary state $x_{w,k+1} = x_{w,k}$ with $x_{w,0} = \mathbb{E}[w_k]$ and set
$\tilde{w}_k = w_k - x_{w,k}$. Dropping the assumptions of linearity and of
no input or state constraints (e.g., to account for obstacles in
the environment) is harder since no closed-form cost expression
for the trajectory tracking case is known in general (which
plays a crucial role in Theorems 1, 2, 3 and Proposition 1).
Still, some ideas can be reused. For instance, the gradient
search in Algorithm 1 can be replaced by a direct search
over the speeds in the horizon that relies on trajectory tracking
simulations for computing costs. Determining if Theorem 3
and Proposition 1 still hold warrants further research.

Fig. 3: Output when path-following is used for a four dimensional
system and a complex trajectory with intersections

Fig. 4: Output when path-following is used for a ten dimensional
system and a complex trajectory with intersections


APPENDIX 
Proof of Theorem 1 


The result follows from applying the dynamic programming
algorithm [18, Vol. 1] to obtain the optimal policy. Let

\[ J_h(\xi_h) = \|z_h - \gamma(s_h)\|_{Q_h}^2 = x_h^\top P_h x_h + 2 x_h^\top N_h \rho_{h-1} + \rho_{h-1}^\top M_h \rho_{h-1}, \]

with $s_h = \bar{s}$, and

\[ J_{h-1}(\xi_{h-1}) = \min_{u_{h-1}} \|z_{h-1} - \gamma(s_{h-1})\|_Q^2 + \|u_{h-1}\|_R^2 + J_h(\xi_h)
= x_{h-1}^\top P_{h-1} x_{h-1} + 2 x_{h-1}^\top N_{h-1} \rho_{h-2} + \rho_{h-2}^\top M_{h-1} \rho_{h-2} \]

be the costs-to-go at stages $h$ and $h-1$, which coincide with
those of trajectory-tracking (7) since $v_{h-1}$ and $r_h = \gamma(s_h)$ are
fixed. The minimizer is $u_{h-1} = K_{h-1} x_{h-1} + L_{h-1} \gamma(s_h)$,
which coincides with (8) for $k = h-1$. However, the cost-to-go
at stage $h-2$ is different from that of trajectory-tracking:


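\begin{align*}
J_{h-2}(\xi_{h-2}) &= \min_{v_{h-2} \in \mathcal{W}_{h-2}^\ell} \min_{u_{h-2}} \|z_{h-2} - \gamma(s_{h-2})\|_Q^2 + \|u_{h-2}\|_R^2 + J_{h-1}\big([(A x_{h-2} + B u_{h-2})^\top\ \ s_{h-2} + \tau v_{h-2}]^\top\big) \\
&= \min_{v_{h-2} \in \mathcal{W}_{h-2}^\ell} x_{h-2}^\top P_{h-2} x_{h-2} + 2 x_{h-2}^\top N_{h-2} q_{h-2}^0(\nu_{h-2}, s_{h-2}) + q_{h-2}^0(\nu_{h-2}, s_{h-2})^\top M_{h-2} q_{h-2}^0(\nu_{h-2}, s_{h-2}) \\
&= x_{h-2}^\top P_{h-2} x_{h-2} + \min_{v_{h-2} \in \mathcal{W}_{h-2}^\ell} f_{h-2}(\xi_{h-2}, (v_{h-2}, v_{h-1})) \\
&= x_{h-2}^\top P_{h-2} x_{h-2} + \min_{\nu_{h-2} \in \mathcal{V}_{h-2}^\ell} f_{h-2}(\xi_{h-2}, \nu_{h-2}).
\end{align*}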


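The minimizer that leads to the second equality (which follows
from the trajectory-tracking solution as $v_{h-2}$ is assumed
fixed while minimizing over $u_{h-2}$) is $u_{h-2} = K_{h-2} x_{h-2} +
L_{h-2} [\gamma(s_{h-1})^\top\ \gamma(s_h)^\top]^\top$, where $s_{h-1} = s_{h-2} + \tau v_{h-2}$ and
$v_{h-2}$ is the minimizer that leads to the last equality and
coincides with (9). The last equality used the fact that $v_{h-1} =
\frac{\bar{s} - s_{h-1}}{\tau}$ and $v_{h-2} \in \mathcal{W}_{h-2}^\ell$ imply $\nu_{h-2} = (v_{h-2}, v_{h-1}) \in
\mathcal{V}_{h-2}^\ell$, for $\ell \in \{1,2,3\}$. In fact, if $\ell = 1$, i.e., under (PF-C),
$v_{h-2} \in (0, \frac{\bar{s} - s_{h-2}}{\tau}]$, which is equivalent to $\nu_{h-2} =
(v_{h-2}, v_{h-1}) \in \mathcal{V}_{h-2}^1 = \{(v_{h-2}, v_{h-1}) \mid v_{h-2} > 0,\ v_{h-1} \ge
0,\ v_{h-2} + v_{h-1} = \frac{\bar{s} - s_{h-2}}{\tau}\}$; if $\ell = 2$, $v_{h-2} \in [\frac{\underline{s} - s_{h-2}}{\tau}, \frac{\bar{s} - s_{h-2}}{\tau}]$
is equivalent to $\nu_{h-2} \in \{(v_{h-2}, v_{h-1}) \mid \underline{s} \le s_{h-2} + \tau v_{h-2} \le
\bar{s},\ \underline{s} \le s_{h-1} + \tau v_{h-1} \le \bar{s}$ and $v_{h-2} + v_{h-1} = \frac{\bar{s} - s_{h-2}}{\tau}\}$. This
fact is trivial if $\ell = 3$. Assume now that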


Dla Se No 


sleet won), 


min 
Vet 1EVe4y 


Tesi (Eeri y= 


(19) 


min  fk41lêk+1, Vk+1) 


T 
— Cpr Pep Ley + P 
Vr+1EVe ry 


and that the optimal policy is given by (8), (9) with k replaced 
by & +1. It is now established that this is also true for k, i.e., 


Jx(Ex) =min min ||z~e— (se) lO + llualle + Teta (E41) 
VREWE Uk 

and the optimal policy at time k is given by (8), (9). Replac- 
ing (19) on the right-hand side and switching the minimization 
operations with respect to uz, and vg, leads to 


h-1 


min, | min > læ- V(se)lloHlue(Ee)llzt+ len -=l 
VREVE Uks aes 


= xg Pptp + min fklék, Vk) 
VREVE 


which, by hypothesis, is $J_k(\xi_k)$, and where the fact that $v_k\in\mathcal{W}^{\ell}_k$ and $\nu_{k+1}\in\mathcal{V}^{\ell}_{k+1}$ is equivalent to $\nu_k\in\mathcal{V}^{\ell}_k$ for every $\ell\in\{1,2,3\}$ was used. In fact, if $\ell = 1$,
Uk € [0, SS] and Vet € _ tetas Dia) > 
0,...,Upn—1 > 0 DDE k41 Ur = StL} j is equivalent to Vk € 
N ,Uh— 1)|Uk > 0,...,Un-1 > 0 oye 1 Ur = E if 
£ = 2, vp € [5 5] and Vert © { (0k41; -< Uh- lS < 


Sr+1 + TUr41 < 5,Vr E€ {k + 1,...,h — 1}, pa ig Uk = 
Soset) is eee to Vi. € Le: ES aa < 8,+Tup < 
5,Yr € {k,...,h—-1}, "oh vk = 5s: }, This fact is trivial 
if $\ell = 3$. Note that in (20), in the inner optimization with respect to $u_k,\dots,u_{h-1}$, $\nu_k$ is fixed and hence the trajectory-tracking solution can be applied. The minimizer $u_k$ is then given by (8) at time $k$. Moreover, the minimizer for $v_k$ is given by (9) at time $k$. This concludes the proof.
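
The structure used in this proof (an inner trajectory-tracking problem for a fixed speed profile, wrapped in an outer minimization over speed profiles) can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the method of the paper: the scalar dynamics `a`, `b`, the weights `q`, `r`, `q_h`, the path `gamma = np.sin`, the constraint set (nonnegative speeds whose sum brings the path parameter from `s0` to `s_end`), and the coarse grid search over speed profiles are all hypothetical choices made for illustration. The inner problem is solved in batch least-squares form rather than by the recursions (6)-(9).

```python
import itertools
import numpy as np

def tracking_cost(x0, refs, a=1.0, b=1.0, q=1.0, r=0.1, q_h=1.0):
    """Inner problem: optimal cost over u_0..u_{h-1} for fixed references r_0..r_h.

    Assumed scalar model x_{k+1} = a x_k + b u_k and cost
    sum_{k<h} q (x_k - r_k)^2 + r u_k^2, plus q_h (x_h - r_h)^2.
    """
    h = len(refs) - 1
    # Batch form: x_k = a^k x0 + sum_{j<k} a^(k-1-j) b u_j, for k = 1..h
    F = np.array([a ** k for k in range(1, h + 1)])
    G = np.zeros((h, h))
    for k in range(1, h + 1):
        for j in range(k):
            G[k - 1, j] = a ** (k - 1 - j) * b
    w = np.array([q] * (h - 1) + [q_h])       # weights on x_1..x_h
    e = np.asarray(refs[1:]) - F * x0         # references minus free response
    H = G.T @ (w[:, None] * G) + r * np.eye(h)
    u = np.linalg.solve(H, G.T @ (w * e))     # weighted least-squares minimizer
    resid = G @ u - e
    cost = q * (x0 - refs[0]) ** 2 + float(resid @ (w * resid) + r * u @ u)
    return cost, u

def path_following_grid(x0, s0, s_end, h, gamma, tau=1.0, m=6):
    """Outer problem: coarse grid over speed profiles with v_k >= 0 and
    tau * sum(v) = s_end - s0 (an assumed constraint set)."""
    total = (s_end - s0) / tau
    best = None
    for n in itertools.product(range(m + 1), repeat=h):
        if sum(n) != m:
            continue
        v = total * np.array(n) / m
        s = s0 + tau * np.concatenate(([0.0], np.cumsum(v)))
        cost, u = tracking_cost(x0, gamma(s))
        if best is None or cost < best[0]:
            best = (cost, v, u)
    return best

h = 3
best_cost, best_v, best_u = path_following_grid(0.0, 0.0, 3.0, h, np.sin)
# Trajectory tracking with a fixed, uniform speed profile for comparison
uniform_cost, _ = tracking_cost(0.0, np.sin(np.linspace(0.0, 3.0, h + 1)))
print(best_cost, uniform_cost, best_v)
```

Because the uniform profile is one of the grid points, the optimized cost cannot exceed the fixed-profile tracking cost in this toy setting.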


Proof of Theorem 2 


The result follows from applying stochastic dynamic programming [18, Vol. 1], optimizing at each step over $u_k$ and $v_k$. As in the proof of Theorem 1, the costs-to-go at stages $k = h$ and $k = h-1$ and the policy at time $k = h-1$ coincide with those
of trajectory-tracking as 5 is fixed. Hereafter, the quantities 
for the cost-to-go in this linear case are denoted by a bar on top, i.e., $\bar J_h(\xi_h) = x_h^\top \bar P_h x_h + 2x_h^\top \bar N_h\rho_{h-1} + \rho_{h-1}^\top \bar M_h\rho_{h-1}$ and $\bar J_{h-1}(\xi_{h-1}) = x_{h-1}^\top \bar P_{h-1}x_{h-1} + 2x_{h-1}^\top \bar N_{h-1}\rho_{h-2} + \rho_{h-2}^\top \bar M_{h-1}\rho_{h-2} + \mathrm{tr}(\bar P_h W)$, and $u_{h-1} = \bar K_{h-1}x_{h-1} + \bar L_{h-1}\gamma(s_h)$, where for $k = h-1$ and $k = h$ these coincide with the quantities in (7) and in (6) without the bar on top (this will not be the case for $k\in\mathcal{H}\setminus\{h-1\}$). Note that the assertion on the policy then holds for $k = h-1$. Assume that the cost-to-go at time $k+1$ is


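$$
\bar J_{k+1}(\xi_{k+1}) = \begin{bmatrix} x_{k+1}\\ r_{k+1}\\ r_h\end{bmatrix}^\top \begin{bmatrix}\bar P_{k+1} & \bar N_{k+1}\\ \bar N_{k+1}^\top & \bar M_{k+1}\end{bmatrix}\begin{bmatrix} x_{k+1}\\ r_{k+1}\\ r_h\end{bmatrix} + c_{k+1},
$$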


where Troi = Y(sk+1) rh = (ọ + x5) and ck} = 
5? t=k+2 tr(PeW), and that the optimal policy at time k + 1 
is (12), (13). This holds at k + 1 = h — 1. Then, the aim is to 
show that the cost-to-go at time k takes the same form (with 
k + 1 replaced by k) and the policy at time k is (12), (13). 
From the stochastic dynamic programming algorithm (which implicitly uses the assumption that the $w_k$ are independent, by considering $\xi_k$ as the state), the cost-to-go at stage $k$ is


$$
\min_{v_k}\,\min_{u_k}\; \|x_k-\gamma(s_k)\|_Q^2 + \|u_k\|_R^2 + \mathbb{E}\big[\bar J_{k+1}(\xi_{k+1})\mid\xi_k\big].
$$


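Since the $w_k$ are zero-mean and letting $c_k = \mathrm{tr}(\bar P_{k+1}W) + c_{k+1}$,

$$
\mathbb{E}\big[\bar J_{k+1}(\xi_{k+1})\mid\xi_k\big] = c_k + \begin{bmatrix} Ax_k+Bu_k\\ r_{k+1}\\ r_h\end{bmatrix}^\top \begin{bmatrix}\bar P_{k+1} & \bar N_{k+1}\\ \bar N_{k+1}^\top & \bar M_{k+1}\end{bmatrix}\begin{bmatrix} Ax_k+Bu_k\\ r_{k+1}\\ r_h\end{bmatrix}
$$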


with rp41 = rptup. Taking first the minimization with respect 
to ux for fixed vk, Tk+1, results in the policy and cost-to-go: 


Uk = Kyrx + Ly ri ri] (20) 


T P, Ñ z 

TaT k Nk k|} 21 
ia e Peal let e e E © OP 
where pf_=[r} rfar] =fr} r} ] Hv. Taking the 
minimization with respect to v; results in (13). which replaced 
in the last expression and then in (13), in (20) and in (21) 
: Pr Ny 
results in Ea r} r] Ne My 


which is Ix (@k)s and Uk = Kvn + Lr ley 


[ae re | ee. 


a(sn)] | as 


desired. In particular, the cost of this optimal policy is Jo(&) 
which coincides with JẸF (ĉo). Note that this control input 
policy is the same for any value of $W$, and, for $W = 0$, $c_k = 0$ for every $k$. Then this control policy must also coincide with (8) for any $W > 0$, and $v_k$ given by (13) must coincide with the first component of the minimizer $\nu_k$ computed by (10), when both policies provide unique values for a given state $\xi_k$; where these are not unique for a given state $\xi_k$, both still provide optimal $u_k$ and $v_k$, but they do not necessarily coincide. Performance improvement follows from Theorem 3, provided that (14), (15) hold. Recall that $u_k$ and $v_k$ can be computed from (8), (9). Condition (14) holds since, for any $\xi_0$, the search space in the minimization in (10) includes any given $v_k$, $k\in\mathcal{H}$; (15) holds since, for any $\xi_k$ (resulting from $\xi_{k-1}$, $w_{k-1}$ and the considered policy), the search space in the minimization in (10) at $k$ includes the tail of the optimal trajectory computed at $k-1$.
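
The step in the proof of Theorem 2 where the zero-mean disturbance only adds a constant to the cost-to-go rests on the standard identity $\mathbb{E}[(m+w)^\top P(m+w)] = m^\top P m + \mathrm{tr}(PW)$ for zero-mean $w$ with covariance $W$. The sketch below checks this identity on synthetic data; the matrix `P`, the vector `mean` (standing in for $Ax_k + Bu_k$) and the sample set are hypothetical values chosen for illustration. The samples are mirrored so that their empirical mean is zero up to rounding, which makes the identity hold for the empirical covariance up to rounding.

```python
import numpy as np

def expected_quadratic(mean, P, samples):
    """Empirical mean of (mean + w)^T P (mean + w) over the rows w of `samples`."""
    shifted = mean + samples                      # shape (n_samples, n)
    return float(np.mean(np.einsum("ij,jk,ik->i", shifted, P, shifted)))

rng = np.random.default_rng(0)
n = 3
G = rng.standard_normal((n, n))
P = G @ G.T                                       # hypothetical symmetric PSD weight
mean = rng.standard_normal(n)                     # stands in for A x_k + B u_k

half = rng.standard_normal((5000, n)) * np.array([0.5, 1.0, 2.0])
samples = np.vstack([half, -half])                # mirrored: zero empirical mean
W = samples.T @ samples / len(samples)            # empirical covariance

lhs = expected_quadratic(mean, P, samples)
rhs = float(mean @ P @ mean + np.trace(P @ W))
print(lhs, rhs)
```

The term $\mathrm{tr}(PW)$ does not depend on `mean`, so it does not affect the minimizer over the control input, which is why the control policy is the same for every $W$.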


Proof of Theorem 3 

Consider a family of policies 7’ = (m4, 7¢,), 6 € {0,...,h} 
where mi ={16,---5 1, 1}. Th ={06,---,)_1} are policies 
for up = Hi (Ek), UR = oh (Ex) € WE defined as: for i = 0, 1° 
is the optimal trajectory-tracking policy with fixed Ux, k € H. 
For ¿ € H \ {0}, 2’ coincides with policy (16), (17), for 
k € {0,...,0—1}, and, for k € {u,...,h—1}, it is the optimal 
trajectory-tracking policy when the trajectory is fixed to the 
tail of the one computed in the previous time step £ =1.—1 
(ie = [Ge Perr en] ), ie; Uk = Di1 k Uk = 
Kktk + Lepr, ke {t, t+1,...,h— 1}, with k = qr (Õhk, Sk) 
and 1%, the tail of 4, = ee ex Up RI ZAR Let 
J' (£0) be the cost of 7’. Note that J? (£o) is equal to (7) 
(with k = 0) which in turn can be written as JO (£o) = 
xd Poxo+fo(€o, Do)+do Since, for 1 = 1, the trajectory is only 
optimized at time k = 0, J! (ĉo) = zd Poxo + fo (ĉo, Žo) + do. 
Due to (14), J! (ĉo) < J? (ĉo) for every ĉo. Since the tail 
trajectory of policy ¢ is fixed to \,-1(%-1), 

EEL 


A lle Wse)G + lekk 
k=0 


J (£o) = 


h—1 
+E[) lle- ollo + lek Elie +llen— Vs, ] 
k=ı 


u-1 
= El) lz- lso + lei (Elle 
k=0 
JE zl Pz, A Pte A,-1(%-1)) F d,] 
where u1" (£x) is given by (6) (corresponding to the trajectory 


associated with \,-1(,—-1)). Due to condition (15), and using 
the fact that x! Px, + f.(é.,0.(6.)) +d = lla- y(s.)IIS + 


Wee (GMA + ea Peer eis + fig i (E41, AD), 
u-1 

F (éo) > El) lle- V(se ld + ek (Eee 
k=0 


+ xl P,x, + flé w(E.)) z d,] 


= El) ller- rsa) + luk (elle + dost 
k=0 


+ aha Ppt + fist (E41, A(H))] = It (Eo). 


This implies that, for every €o, Jò" (ĉo) = JI” (&o) < 
JP (£0) < + < J! (£0) < J” (£0) = Jo” (£o, Do). 


Proof of Proposition 1 


Condition (14) holds since, for any £o, the starting value of 
Algorithm 1 is the fixed Po € V$, and the gradient search steps 
with line search can only lead to a cost reduction of f and also 
ensure that i € VÉ for every choice of £ € {1,2,3}. Note that 
Pe-1 € Vy_, implies Ag(H—1) € V£. Condition (15) holds 
since, for any €,_1, the starting value of Algorithm 1 at time 
k is Ak(ŬŪk-1), Ďk-1 = We (Ex—1) which belongs to w: and 
the gradient search steps with line search can only lead to a 
cost reduction of f and also ensure that % € V£. 
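
The argument above relies on two properties of a gradient search with line search: every iterate stays in the feasible set, and the cost never rises above its value at the starting point. The following is a minimal sketch of those two properties and is not Algorithm 1 itself. The feasible set (nonnegative entries with a fixed sum), the quadratic toy cost and the names `descent_with_line_search`, `f`, `grad`, `target` are assumptions made for illustration.

```python
import numpy as np

def descent_with_line_search(f, grad, nu0, iters=50, shrink=0.5, min_step=1e-12):
    """Gradient search on {nu >= 0, sum(nu) fixed} that only accepts cost decreases."""
    nu = np.asarray(nu0, dtype=float).copy()
    costs = [f(nu)]
    for _ in range(iters):
        g = grad(nu)
        d = -(g - g.mean())                 # direction that keeps sum(nu) unchanged
        t = 1.0
        neg = d < 0
        if neg.any():                       # largest step keeping nu >= 0
            t = min(t, float(np.min(-nu[neg] / d[neg])))
        # Backtracking: shrink the step until the cost strictly decreases
        while t > min_step and f(np.maximum(nu + t * d, 0.0)) >= costs[-1]:
            t *= shrink
        if t <= min_step:                   # no decrease found (or blocked): stop
            break
        nu = np.maximum(nu + t * d, 0.0)    # clip rounding-level negatives
        costs.append(f(nu))
    return nu, costs

# Hypothetical toy cost standing in for f(xi, nu)
target = np.array([3.0, 0.5, 0.2, 0.1])
f = lambda nu: float(np.sum((nu - target) ** 2))
grad = lambda nu: 2.0 * (nu - target)

nu0 = np.ones(4)                            # feasible starting profile
nu_hat, costs = descent_with_line_search(f, grad, nu0)
print(nu_hat, costs[0], costs[-1])
```

Since a step is accepted only when it lowers the cost, the returned value can be no worse than the starting profile, which mirrors the reasoning used for conditions (14) and (15).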
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